Effect of Data Skewness in Parallel Mining of Association Rules

Author

  • David W. Cheung
Abstract

An efficient parallel algorithm, FPM (Fast Parallel Mining), has been proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, performing only one round of message exchange in each iteration. We found that the two pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good speedup, scaleup, and sizeup.
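The two pruning ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy partitions, and the 30% support threshold are all assumptions made for illustration, and the pruning rules follow the commonly described forms (a globally large itemset must be locally large at some partition; the sum of per-partition upper bounds must reach the global threshold). The candidates are assumed to have been generated from globally large 1-itemsets.

```python
from itertools import combinations

def local_counts(partition, itemsets):
    """Support count of each itemset within a single partition."""
    return {x: sum(1 for t in partition if x <= t) for x in itemsets}

def sub_itemsets(x):
    """All (k-1)-subsets of a k-itemset."""
    return [frozenset(s) for s in combinations(x, len(x) - 1)]

def distributed_prune(candidates, partitions, prev_counts, min_support):
    """Keep a candidate only if, at some partition, all of its
    (k-1)-subsets are locally large (assumed form of the rule)."""
    kept = set()
    for x in candidates:
        for part, counts in zip(partitions, prev_counts):
            threshold = min_support * len(part)
            if all(counts[y] >= threshold for y in sub_itemsets(x)):
                kept.add(x)
                break
    return kept

def global_prune(candidates, partitions, prev_counts, min_support):
    """Keep a candidate only if the sum over partitions of the smallest
    subset count (an upper bound on its support) reaches the threshold."""
    total = sum(len(p) for p in partitions)
    kept = set()
    for x in candidates:
        bound = sum(min(c[y] for y in sub_itemsets(x)) for c in prev_counts)
        if bound >= min_support * total:
            kept.add(x)
    return kept

# Hypothetical, highly skewed data: items a, b occur only in partition 0,
# items c, d only in partition 1.
partitions = [
    [frozenset("ab"), frozenset("ab"), frozenset("ab"), frozenset("a")],
    [frozenset("cd"), frozenset("cd"), frozenset("cd"), frozenset("c")],
]
min_support = 0.3  # assumed threshold

items = [frozenset(i) for i in "abcd"]
# Local counts of the 1-itemsets, as exchanged in the previous iteration.
prev_counts = [local_counts(p, items) for p in partitions]

# Apriori-style 2-itemset candidates from the four large items.
candidates = {frozenset(c) for c in combinations("abcd", 2)}

after_distributed = distributed_prune(candidates, partitions, prev_counts, min_support)
after_global = global_prune(candidates, partitions, prev_counts, min_support)

print(len(candidates), "candidates before pruning")
print(sorted("".join(sorted(x)) for x in after_distributed))
print(sorted("".join(sorted(x)) for x in after_global))
```

In this toy data the skewness is extreme, so both techniques reduce the six candidates to the same two. With milder skewness the two checks can differ, because the global bound uses the actual counts from every partition rather than only a per-partition large/not-large test.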

Similar resources

Effect of Data Skewness and Workload Balance in Parallel Data Mining

To mine association rules efficiently, we have developed a new parallel mining algorithm, FPM, on a distributed shared-nothing parallel system in which data are partitioned across the processors. FPM is an enhancement of the FDM algorithm, which we proposed previously for distributed mining of association rules [8]. FPM requires fewer rounds of message exchange than FDM and hence has a better res...

A Hypotheses-based Method for Identifying Skewed Itemsets

Parallel and distributed association rule mining are important research subjects that have been addressed by a variety of work. Data skewness, which describes the degree of non-uniformity of the itemset distribution among database partitions, causes various problems for parallel and distributed association rule mining algorithms, such as the generation of many false candidate itemsets. However, some a...

Knowledge and attitude of the families with a mental patient about the electroconvulsive therapy at the Ebn-e-Sina psychiatric center of Mashhad

Introduction: Electroconvulsive therapy is the most common and also a unique method in psychiatry, used for the treatment of many mental disorders such as major depression and schizophrenia. Despite the safety and usefulness of this method, patients and families experience a great deal of stress about it. This stressful approach caused negative attitudes and difficulties ...

EFFECTS OF DESMOPRESSIN ON MEMORY DISORDERS DUE TO ELECTROCONVULSIVE THERAPY (ECT) IN HUMANS

Electroconvulsive therapy (ECT) is an efficient treatment for several neuropsychiatric disorders; however, a large number of patients develop memory impairment after ECT. Different studies in both animals and humans suggest that vasopressin has positive effects on memory and improves cognitive functions. In this randomized, double-blind controlled clinical trial, 50 patients with psychiatric d...

Portfolio Performance Evaluation in a Modified Mean-Variance-Skewness Framework with Negative Data

The present study is an attempt at evaluating the performance of portfolios using a mean-variance-skewness model with negative data. A mean-variance non-linear framework and a mean-variance-skewness non-linear framework had been proposed based on Data Envelopment Analysis (DEA), in which the variance of the assets was used as an input to the DEA and expected return and skewness were the outputs. C...

Publication date: 1998